Protein Data Bank (file format)

The Protein Data Bank (PDB) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. The PDB format accordingly provides for the description and annotation of protein and nucleic acid structures, including atomic coordinates, secondary structure assignments, and atomic connectivity. Experimental metadata are stored as well. The PDB format is the legacy file format for the Protein Data Bank, which now keeps data on biological macromolecules in the newer mmCIF file format.
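The format is line-oriented: each line (a "record") begins with a record name in columns 1-6, such as ATOM or HETATM for coordinates, HELIX and SHEET for secondary structure, and CONECT for connectivity. A reader can therefore dispatch on that name before interpreting the rest of the line. The following is a minimal Python sketch under that assumption; the function names record_name, count_records and lines_of are hypothetical helpers, not part of any library.

```python
from collections import Counter


def record_name(line: str) -> str:
    """Return the record name stored in columns 1-6 of a PDB line."""
    return line[:6].strip()


def count_records(text: str) -> Counter:
    """Tally how many non-blank lines of each record type a PDB text contains."""
    return Counter(record_name(line) for line in text.splitlines() if line.strip())


def lines_of(text: str, *names: str) -> list:
    """Return only the lines whose record name is one of `names`."""
    wanted = set(names)
    return [line for line in text.splitlines() if record_name(line) in wanted]


if __name__ == "__main__":
    # A few lines copied from the example file excerpt in this article.
    sample = "\n".join([
        "HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I",
        "REMARK 350 BIOMOLECULE: 1",
        "ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N",
        "HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C",
    ])
    print(count_records(sample))
    print(lines_of(sample, "ATOM", "HETATM"))
```

Note that "REMARK 350" is a REMARK record; the number 350 is part of the record's content, not its name.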


History

The PDB file format was invented in 1976 as a human-readable file that would allow researchers to exchange protein coordinates through a database system. Its fixed-column-width format is limited to 80 columns, a limit based on the width of the computer punch cards that were previously used to exchange the coordinates (Berman, Helen M. "The protein data bank: a historical perspective." Acta Crystallographica Section A 64.1 (2007): 88-95). Over the years the file format has undergone many changes and revisions; at the time of writing, the most recent revision is 3.30.
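Because every field occupies fixed columns, a reader should slice lines by column rather than split them on whitespace. The HEADER classification "EXTRACELLULAR MATRIX" in the example below, for instance, contains a space. A minimal Python sketch, assuming the HEADER layout of format version 3.3 (classification in columns 11-50, deposition date in 51-59, ID code in 63-66); the helper names columns and parse_header are hypothetical.

```python
def columns(line: str, start: int, end: int) -> str:
    """Return the text in 1-based, inclusive columns start..end, padding stripped."""
    return line[start - 1:end].strip()


def parse_header(line: str) -> dict:
    """Split a HEADER record into its fixed-column fields (format v3.3 layout)."""
    line = line.rstrip("\n")
    if len(line) > 80:
        raise ValueError("PDB records are at most 80 columns wide")
    line = line.ljust(80)  # trailing blanks are often trimmed, so pad back to 80
    if columns(line, 1, 6) != "HEADER":
        raise ValueError("not a HEADER record")
    return {
        "classification": columns(line, 11, 50),
        "dep_date": columns(line, 51, 59),
        "id_code": columns(line, 63, 66),
    }


if __name__ == "__main__":
    header = "HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I"
    print(parse_header(header))
```

Splitting that line on whitespace would break the classification into two tokens, which is why column slicing is the safer default for this format.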


Example

A typical PDB file describing a protein consists of hundreds to thousands of lines like the following (taken from a file describing the structure of a synthetic collagen-like peptide):
HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I
TITLE     X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE    2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA    X-RAY DIFFRACTION
AUTHOR    R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR   2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
...
SEQRES   1 A    9  PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES   1 B    6  PRO PRO GLY PRO PRO GLY
SEQRES   1 C    6  PRO PRO GLY PRO PRO GLY
...
ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N
ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C
ATOM      3  C   PRO A   1       8.487  20.707  19.092  1.00 17.44           C
ATOM      4  O   PRO A   1       9.466  21.457  19.005  1.00 17.44           O
ATOM      5  CB  PRO A   1       6.460  21.723  20.211  1.00 22.26           C
...
HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C
HETATM  131  O   ACY   401       2.807  23.097  10.553  1.00 21.19           O
HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O
...
HEADER, TITLE and AUTHOR records: identify the structure and the researchers who determined it; numerous other record types are available to provide other kinds of information.
REMARK records: can contain free-form annotation, but they also accommodate standardized information; for example, the REMARK 350 BIOMT records describe how to compute the coordinates of the experimentally observed multimer from the explicitly specified coordinates of a single repeating unit.
SEQRES records: give the sequences of the three peptide chains (named A, B and C), which are very short in this example but usually span multiple lines.
ATOM records: describe the coordinates of the atoms that are part of the protein. For example, the first ATOM line above describes the backbone nitrogen atom (N) of the first residue of peptide chain A, which is a proline residue; the first three floating-point numbers are its x, y and z coordinates, in ångströms. The next three columns are the occupancy, the temperature factor, and the element symbol, respectively.
HETATM records: describe the coordinates of hetero-atoms, that is, atoms that are not part of the protein molecule.
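The ATOM/HETATM fields and the REMARK 350 BIOMT operators described above can be read with a short Python sketch. The column ranges follow the format version 3.3 specification (for example x, y and z in columns 31-38, 39-46 and 47-54). Splitting BIOMT rows on whitespace, and the names Atom, parse_atom, parse_biomt and apply_biomt, are illustrative assumptions rather than any library's API. Each BIOMT operator is a 3×3 rotation matrix R (rows BIOMT1 to BIOMT3) plus a translation vector t, applied as x' = Rx + t to produce one copy of the repeating unit.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str        # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    chain_id: str      # empty when the chain column is blank
    res_seq: int
    x: float           # orthogonal coordinates in ångströms
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def col(line: str, start: int, end: int) -> str:
    """Return 1-based, inclusive columns start..end with padding stripped."""
    return line[start - 1:end].strip()


def parse_atom(line: str) -> Atom:
    """Read an ATOM or HETATM record using the fixed column layout of format v3.3."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice exists
    return Atom(
        record=col(line, 1, 6),
        serial=int(col(line, 7, 11)),
        name=col(line, 13, 16),
        res_name=col(line, 18, 20),
        chain_id=col(line, 22, 22),
        res_seq=int(col(line, 23, 26)),
        x=float(col(line, 31, 38)),
        y=float(col(line, 39, 46)),
        z=float(col(line, 47, 54)),
        occupancy=float(col(line, 55, 60)),
        temp_factor=float(col(line, 61, 66)),
        element=col(line, 77, 78),
    )


def parse_biomt(lines):
    """Collect REMARK 350 BIOMTn rows into {operator serial: (3x3 rotation, translation)}.

    Assumes the numeric fields are separated by whitespace, as in the example above.
    """
    ops = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 8 and tokens[:2] == ["REMARK", "350"] and tokens[2].startswith("BIOMT"):
            row = int(tokens[2][5:]) - 1  # BIOMT1..BIOMT3 -> matrix row 0..2
            serial = int(tokens[3])
            rot, trans = ops.setdefault(serial, ([[0.0] * 3 for _ in range(3)], [0.0] * 3))
            rot[row] = [float(t) for t in tokens[4:7]]
            trans[row] = float(tokens[7])
    return ops


def apply_biomt(op, xyz):
    """Map coordinates of the deposited unit to one copy: x' = R x + t."""
    rot, trans = op
    return tuple(sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] for i in range(3))


if __name__ == "__main__":
    atom = parse_atom("ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N")
    print(atom.name, atom.res_name, atom.chain_id, (atom.x, atom.y, atom.z))
```

Real files also carry alternate-location indicators, insertion codes and charges (columns 17, 27 and 79-80), which this sketch ignores.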




See also

* Chemical file format
* ScientificPython: a Python library that provides an interface to PDB files
* Software for molecular mechanics modeling




External links


* PDB Format Guide: the current version (3.3) of the PDB format specification
* PDBML: a more recent, alternative XML-based file format for molecular coordinates
* The RCSB Protein Data Bank
* Protein Data Bank in Europe
* The Molecular Modeling DataBase (MMDB) from NCBI
* The wwPDB Remediation Project, from wwPDB
* MakeMultimer: an online tool for expanding BIOMT records in PDB files
* Molecules: an iPad/iPhone app to display PDB files
* Python Macromolecular Library (mmLib): a Python library capable of reading and writing PDB files